-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
125 fix germline variant calling #128
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this generally seems fine, and solves the immediate problem as I understand it, as long as the inputs are set properly (see above comment about defaults).
Has this actually been tested on some data and shown to have the desired effect?
I do wonder if this germline calling should be reviewed and brought up to date with the latest and greatest GATK recommendations, but that's a separate issue!
definitions/immuno.wdl
Outdated
@@ -199,6 +199,7 @@ workflow immuno { | |||
Array[Array[String]] gatk_haplotypecaller_intervals | |||
Int? ploidy | |||
String? optitype_name | |||
Float germline_filter_gnomAD_maximum_population_allele_frequency |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should immuno go ahead and set a default at the top level of 1.0 so that no variants get filtered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing now! I left germline_filter_gnomAD_maximum_population_allele_frequency set to 0.05 in germline_detect_variants. Would it be more correct style wise to set that value at the top level and have it be passed down?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think our general style is to define the defaults as low down as possible (at the tool level), and then pass parameters upward as needed so that they can be overridden (with the override being set as high up as possible, so we don't have to go digging through 8 layers to find it). In the top-level immuno wdl, I think it would be reasonable to set that value such that it always passes all germline variants, since various steps in that workflow depend on it.
The change has been tested with germline_filter_gnomAD_maximum_population_allele_frequency set at three separate values: 1.1 (which results in no filtering), .05, and .01 (which results in extreme filtering). To verify that these changes took place, I viewed the intermediate files between the gnomadFrequencyFilter step and the setFilteredVcfName (within germline_filter_vcf.wdl). After investigating these files I found that the run with the filter of 1.1 resulted in files that were the exact same size with the same values (as expected) while the run with the filter of 0.01 had a file that was significantly smaller in size than the pre-filtering file, removing all entries with a gnomAD AF above 0.01. Users can now set the value within the yaml file or choose not to set it, in which the default value will be 1.1 resulting in no filtering. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a few files in this PR ended up with mode changes.
Those files have now been fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 📇
This PR is related to issue #125
Enables the user to override the default for filtering of germline SNPs